Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(source-amazon-ads): Configure max concurrent async job count #55745

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

pnilan
Copy link
Contributor

@pnilan pnilan commented Mar 13, 2025

What

  • Sets max_concurrent_async_job_count to 10
  • To conform w/ airbyte-cdk v6.36.3
    • Updates urls_extractor to download_target_extractor.
    • Updates interpolation references to stream_slice["create_job_response"] to creation_response.
    • Updates interpolation references to stream_slice.extra_fields["url"] to download_target.

Note

Each asynchronous API operation supports a maximum number of requests per second (per region, per account), and also a maximum of 10 active jobs.

Amazon API Docs

User Impact

  • Report streams theoretically should be 10x faster

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Mar 13, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 14, 2025 5:55pm

Copy link
Contributor

@aldogonzalez8 aldogonzalez8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVED

@dbgold17
Copy link
Contributor

dbgold17 commented Mar 13, 2025

@pnilan how did you choose to set concurrent async job count to 10 vs say 2 like you did here?

@pnilan
Copy link
Contributor Author

pnilan commented Mar 13, 2025

@dbgold17

Each asynchronous API operation supports a maximum number of requests per second (per region, per account), and also a maximum of 10 active jobs.

Amazon API Docs

@maxi297
Copy link
Contributor

maxi297 commented Mar 14, 2025

I'm very hyped for this. Can we take a couple of syncs that are long and see how much it improves the performance?

@@ -7,6 +7,8 @@ check:
stream_names:
- streams

max_concurrent_async_job_count: 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we make this configurable? I fear some users might re-use their creds for other processes or even for multiple syncs hence making this configurable would give us an escape patch in those cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this for a while, but this is the best justification I've seen for making it configurable. I'll create a follow-up issue for this.

@natikgadzhi
Copy link
Contributor

Unit tests borked?

@pnilan
Copy link
Contributor Author

pnilan commented Mar 17, 2025

@natikgadzhi (cc: @maxi297 @bazarnov )

There was a change in the CDK that affects this connector. Specifically create_job_response requests.Response object is no longer available in the interpolation context so the polling requester loses context on the initial job creation request headers. Maybe I'm having a brain fart but I don't think it's straightforward to remedy this without another CDK change.

Amazon-Advertising-API-Scope: "{{ stream_slice['create_job_response'].request.headers['Amazon-Advertising-API-Scope'] }}"

@maxi297
Copy link
Contributor

maxi297 commented Mar 18, 2025

The changes needed to resolve this were documented as part of this ticket and @topefolorunso is assigned to this issue. I'm not sure what is the progress on this though

@topefolorunso
Copy link
Collaborator

topefolorunso commented Mar 18, 2025

@maxi297 @pnilan See PR raised here #55806

@bazarnov
Copy link
Collaborator

@natikgadzhi (cc: @maxi297 @bazarnov )

There was a change in the CDK that affects this connector. Specifically create_job_response requests.Response object is no longer available in the interpolation context so the polling requester loses context on the initial job creation request headers. Maybe I'm having a brain fart but I don't think it's straightforward to remedy this without another CDK change.

Amazon-Advertising-API-Scope: "{{ stream_slice['create_job_response'].request.headers['Amazon-Advertising-API-Scope'] }}"

@pnilan @maxi297 This is something I've missed during the adding more interpolation context to the AsyncRetriever.
This is fixed in this PR, please consider updating the source to use the latest version of the CDK once it's released. Thank you.

The update for the manifest would look like this:

Amazon-Advertising-API-Scope: "{{ creation_response['headers]['Amazon-Advertising-API-Scope'] }}"

The same properties would be available for the polling_response interpolation context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants